Dynamic language modeling for European Portuguese
Abstract
This paper reports on daily vocabulary and language model adaptation for a European Portuguese broadcast news transcription system. The proposed adaptation framework takes into account characteristics of European Portuguese, such as its high level of inflection and complex verbal system. A multi-pass speech recognition framework is proposed that uses contemporary written texts available daily on the Web. It uses morpho-syntactic knowledge (part-of-speech information) about an in-domain training corpus for the daily selection of an optimal vocabulary. Using an information retrieval engine, with the automatic speech recognition (ASR) hypotheses as query material, relevant documents are extracted from a large, dynamic dataset to generate a story-based language model. When applied to a daily live closed-captioning system for TV broadcasts, the framework proved effective, yielding relative reductions in out-of-vocabulary word rate (69%) and word error rate (WER, 12.0%) compared with a baseline system with the same vocabulary size. © 2010 Elsevier Ltd. All rights reserved.
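The two metrics quoted in the abstract can be illustrated with a minimal sketch. It assumes the out-of-vocabulary (OOV) word rate is the fraction of running words in a reference transcript that are missing from the recognizer vocabulary, and that a relative reduction is measured against the baseline value. The function names, the toy sentence, and both vocabularies are hypothetical and are not taken from the paper.

```python
def oov_rate(tokens, vocabulary):
    """Fraction of running words (tokens) not covered by the vocabulary."""
    if not tokens:
        raise ValueError("tokens must be non-empty")
    missing = sum(1 for token in tokens if token not in vocabulary)
    return missing / len(tokens)


def relative_reduction(baseline, adapted):
    """Relative reduction of a rate, measured against the baseline value."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - adapted) / baseline


# Hypothetical toy data: one reference sentence and two vocabularies of the
# same size, mirroring the abstract's "same vocabulary size" comparison.
reference = "o governo anunciou hoje novas medidas para a economia".split()
baseline_vocab = {"o", "governo", "hoje", "para", "a",
                  "futebol", "chuva", "ontem"}
adapted_vocab = {"o", "governo", "hoje", "para", "a",
                 "anunciou", "medidas", "economia"}

base = oov_rate(reference, baseline_vocab)    # 4 of 9 words are OOV
adapt = oov_rate(reference, adapted_vocab)    # 1 of 9 words is OOV
print(f"baseline OOV rate: {base:.3f}")
print(f"adapted OOV rate:  {adapt:.3f}")
print(f"relative reduction: {relative_reduction(base, adapt):.0%}")
```

The same `relative_reduction` helper applies to WER, given baseline and adapted error rates computed elsewhere.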
Similar resources
Dynamic Language Modeling for the European Portuguese
Up-to-date language modeling is recognized as critical to maintaining a speech recognizer's level of performance over time for most applications. This is particularly true for applications such as transcription of broadcast news and conversations, where new words occur very frequently, especially in highly inflected languages such as European Portuguese. An unsupervised adap...
The Presence and Influence of English in the Portuguese Financial Media
As the lingua franca of the 21st century, English has become the main language for intercultural communication for those wanting to embrace globalization. In Portugal, it is the second language of most public and private domains influencing its culture and discourses. Language contact situations transform languages by the incorporations they make from other languages and Portugal has...
A parametric study of the spectral characteristics of European Portuguese fricatives
Studies of Portuguese phonetics and phonology indicate that fricatives are central to some interesting features of the language, yet studies of Portuguese fricatives have been few and limited. In this study, Portuguese fricatives were analyzed in ways designed to enhance our description of the language and to increase our understanding of the production of fricatives. Corpora of Portuguese word...
Exploiting variety-dependent Phones in Portuguese Variety Identification
This paper presents a new approach of building a language identification system using a specialized Phone Recognition system followed by Language Modeling (PRLM) to differentiate Portuguese varieties spoken in African Countries from European Portuguese. The system is designed to focus on exploiting the phonotactic information of a single discriminatively trained tokenizer for the specific pair ...
Automatic Speech Recognition and Identification of African Portuguese
This document deals with speech recognition of different Portuguese varieties; it summarizes results from the author’s diploma thesis [9]. The performance of a hybrid large vocabulary continuous speech recognizer, which combines multi-layer perceptrons and Hidden Markov Models, degrades heavily in the presence of African Portuguese varieties in broadcast news. Variety-specific acoustic and languag...
Journal title:
- Computer Speech & Language
Volume 24, Issue
Pages -
Publication date 2010